
Emotion AI¶

Nutshell¶

In this project I build a program that classifies emotions from images of human faces, following the course Modern Artificial Intelligence, lectured by Dr. Ryan Ahmed, Ph.D., MBA.

The dataset I use is from https://www.kaggle.com/c/facial-keypoints-detection/overview and consists of over 20,000 facial images labeled with a facial expression/emotion, plus approximately 2,000 images with keypoint annotations.

The program trains two models:

  1. one that detects facial keypoints
  2. one that detects emotions.

These models are then combined into a single model that outputs both the keypoints and the emotion.

A short recap of artificial neural networks¶

Artificial neurons are built in a similar way to biological neurons. An artificial neuron takes in signals through input channels (the dendrites of a biological neuron), processes the information through a transfer function (the cell body), and generates an output (which in a biological neuron would travel along the axon).


Fig. 1. Side by side view of artificial and biological neurons. Credit: Top image from Introduction to Psychology (A critical approach) Copyright © 2021 by Rose M. Spielman; Kathryn Dumper; William Jenkins; Arlene Lacombe; Marilyn Lovett; and Marion Perlmutter licensed under a Creative Commons Attribution 4.0 International License. Bottom image Chrislb, CC BY-SA 3.0 , via Wikimedia Commons

For example, let's consider an artificial neuron (AN) that takes three inputs: $x_1$, $x_2$, and $x_3$. We can then express the output of the artificial neuron mathematically as $y = \phi(x_1 w_1 + x_2 w_2 + x_3 w_3 + b)$. Here $y$ is the output and the $w$s are the weights assigned to each input signal. $b$ is a bias term added to the weighted sum of inputs, and $\phi$ is the activation function.

Some common modern activation functions used in neural networks are, for example, ReLU, GELU, and the logistic function. ReLU is short for Rectified Linear Unit and is defined as $\phi(x) = \max(0, x)$. ReLU is recommended for hidden layers, since it outputs a linear response for positive values. This helps maintain larger gradients and makes training deep networks more feasible.

The Gaussian Error Linear Unit (GELU) is a smoother version of ReLU and is defined as $x\,\Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard normal (Gaussian) distribution.

The logistic activation function is also called the sigmoid function and is defined as $\phi(x) = \frac{1}{1+e^{-x}}$. It squashes any number into the range 0 to 1, which makes it very useful in output layers.
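As a concrete reference, the three activation functions can be written out in a few lines of plain Python (the GELU here uses the exact Gaussian CDF via `math.erf`):

```python
import math

def relu(x):
    # Zero for negative inputs, linear for positive inputs
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```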


Training¶

All neural networks need to be trained with labeled data. The available data is generally divided into 80% training and 20% testing data. It is also recommended to further divide the training portion into an actual training set (e.g. 60% of the total) and a validation set (e.g. 20%).
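A minimal sketch of such a 60/20/20 split with NumPy, on hypothetical data (later in this notebook a similar split is done with `train_test_split`):

```python
import numpy as np

# Hypothetical data: 100 samples with 5 features each
X = np.arange(500).reshape(100, 5)

# Shuffle the sample indices reproducibly
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))

# 60% train, 20% validation, 20% test
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
X_train = X[idx[:n_train]]
X_val = X[idx[n_train:n_train + n_val]]
X_test = X[idx[n_train + n_val:]]
```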

Training is done by adjusting the weights of the network, iteratively minimising the cost function with, for example, the gradient descent optimization algorithm. Gradient descent calculates the gradient of the cost function and then takes a step in the negative gradient direction, repeating until it reaches a local or global minimum.

A typical choice of cost function is the quadratic loss (mean squared error), formulated as $f_{loss}(w,b) = \frac{1}{N}\sum^{N}_{i=1}(\hat y_i - y_i)^2$.
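For example, the quadratic loss of a small prediction vector can be computed directly:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.0])

# Mean of the squared residuals
mse = np.mean((y_pred - y_true) ** 2)
```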

Gradient descent algorithm:

1. Calculate the derivative of the loss function, $\frac{\partial f_{loss}}{\partial w}$.

2. Pick random initial values for the weights and substitute them in.

3. Calculate the step size, i.e. how much we will update the weights:

step size = learning rate * gradient $= \alpha \cdot \frac{\partial f_{loss}}{\partial w}$

4. Update the parameters and repeat:

new weight = old weight - step size, i.e. $w_{new} = w_{old} - \alpha \cdot \frac{\partial f_{loss}}{\partial w}$

Below is an example of searching for the minimum of a u-shaped function with gradient descent. In practice the problem is usually multidimensional, but it is solved in the same way.

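The steps above can be condensed into a few lines of Python; here gradient descent finds the minimum of the u-shaped $f(w) = (w-3)^2$ (the learning rate and step count are arbitrary illustrative choices):

```python
def gradient_descent(df, w0, lr=0.1, steps=100):
    # Repeatedly step against the gradient: w <- w - lr * df(w)
    w = w0
    for _ in range(steps):
        w -= lr * df(w)
    return w

# Minimise f(w) = (w - 3)^2, whose derivative is 2 * (w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```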

Testing various learning rates helps to understand the importance of choosing the training parameters well.


As shown above, too large a learning rate can overshoot the global minimum and/or prevent the model from converging quickly. Too small a learning rate is equally problematic: the model barely learns. To address both failure modes, there are several approaches that adjust the learning rate dynamically.

Momentum is analogous to a ball's tendency to keep rolling downhill. It speeds up learning when the error gradient keeps pointing in the same direction, and slows down when a level area is reached. Momentum is controlled by a parameter analogous to the mass of the rolling ball. A large momentum helps avoid getting stuck in local minima, but might also push through the minimum we wish to find. Thus the parameter has to be selected carefully.

Learning rates can also be adjusted through decay, which reduces the learning rate by a fixed factor after a set number of epochs. This helps in situations like the one above, where too large a learning rate makes the optimiser jump back and forth over a minimum.

Adagrad and Adam are examples of popular adaptive optimisation algorithms for gradient descent.
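The momentum and decay updates described above can be sketched in a few lines (the function and parameter names here are illustrative, not from any specific library):

```python
def sgd_momentum_step(w, v, grad, lr, beta=0.9):
    # The velocity v accumulates past gradients; beta plays the role
    # of the ball's "mass" in the rolling-ball analogy
    v = beta * v - lr * grad
    return w + v, v

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    # Halve the learning rate every `epochs_per_drop` epochs
    return lr0 * drop ** (epoch // epochs_per_drop)
```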

Network architectures¶

The artificial neurons are connected to each other to form neural networks and a plethora of different network architectures exist. To harness the power of AI, it is necessary to know which architecture serves the intended purpose best. Below are three common architectures and their applications.

Recurrent Neural Networks (RNNs) handle sequential data by maintaining a hidden state that captures information about previous elements in the sequence. Therefore they are great for contexts where the output depends on previous inputs, for example time series and natural language processing.

Generative Adversarial Networks (GANs) consist of two neural networks, the Generator and the Discriminator. They spar with each other in a zero-sum game, where the generator creates synthetic data that resembles real data and the discriminator evaluates whether it is real or not. This drives the generator to output increasingly realistic data. GANs are the natural choice for many image generation and editing tasks, but also for anomaly detection in industrial and security contexts: they can model regular patterns and then detect anomalies by comparing generated outputs with real inputs.

Convolutional Neural Networks (CNN) are designed to process data with a grid-like topology and are most commonly used in image analysis. They utilise convolutional layers to learn spatial hierarchies by applying filters (kernels) that slide (convolve) over the input. They usually involve pooling layers that reduce the spatial dimensions and fully connected layers that map the extracted features to outputs.


Fig. 2. Convolutional neural network. Credit: Aphex34, CC BY-SA 4.0, via Wikimedia Commons

In the Emotion AI, I will use a Residual Neural Network (ResNet). ResNet's architecture includes "skip connections", which enable training very deep networks without vanishing-gradient issues. The vanishing gradient problem occurs when the gradient is back-propagated to earlier layers and becomes very small along the way. A skip connection works by passing the input of one layer to a layer further down in the network; this is also called identity mapping. The ResNet model I use has been pretrained on the ImageNet dataset.


Fig. 3. Identity mapping. Credit: LunarLullaby, CC BY-SA 4.0, via Wikimedia Commons
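The effect of the identity mapping can be illustrated in plain NumPy, with a hypothetical transformation `f` standing in for the convolutional layers:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, f):
    # The skip connection adds the untouched input back onto the
    # output of the transformation f before the activation
    return relu(f(x) + x)

# Even if f contributes nothing, the block still passes x through unchanged,
# which is what keeps gradients flowing through very deep stacks
x = np.array([0.5, 1.0, 2.0])
out = residual_block(x, lambda t: np.zeros_like(t))
```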

Part 1. Key facial points detection¶

In this section I program the DL model with convolutional neural network and residual blocks to predict facial keypoints. The data set is from https://www.kaggle.com/c/facial-keypoints-detection/overview.

The dataset consists of input images with 15 facial keypoints each. The training.csv file has 7049 face images with corresponding keypoint locations. The test.csv file has face images only, and will be used to test the model. Each image is stored as a single string of space-separated pixel values (the 'Image' column used here has shape (2140,)). These strings have to be transformed into the real image shape, (96, 96): we create a 1-D array from each string and reshape it into a 2-D array.

The model I build will have the architecture presented below. The Resblock consists of two different types of blocks: a convolution block and an identity block. As seen below, both have an additional short path that adds the original input to the output. For the convolution block this includes a few extra steps to shape the input to the same dimensions as the output of the longer path.

Final model architecture
Resblock architecture
key_points_df['Image'].shape
key_points_df['Image'][0]
type(key_points_df['Image'][0])

# Convert each space-separated pixel string into a (96, 96) array
# (np.fromstring is deprecated for text parsing, so split the string instead)
key_points_df['Image'] = key_points_df['Image'].apply(lambda img: np.array(img.split(), dtype=int).reshape(96, 96))
key_points_df['Image'][0].shape
(96, 96)
key_points_df.describe()
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_x nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y
count 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 ... 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000 2140.000000
mean 66.221549 36.842274 29.640269 37.063815 59.272128 37.856014 73.412473 37.640110 36.603107 37.920852 ... 47.952141 57.253926 63.419076 75.887660 32.967365 76.134065 48.081325 72.681125 48.149654 82.630412
std 2.087683 2.294027 2.051575 2.234334 2.005631 2.034500 2.701639 2.684162 1.822784 2.009505 ... 3.276053 4.528635 3.650131 4.438565 3.595103 4.259514 2.723274 5.108675 3.032389 4.813557
min 47.835757 23.832996 18.922611 24.773072 41.779381 27.190098 52.947144 26.250023 24.112624 26.250023 ... 24.472590 41.558400 43.869480 57.023258 9.778137 56.690208 32.260312 56.719043 33.047605 57.232296
25% 65.046300 35.468842 28.472224 35.818377 58.113054 36.607950 71.741978 36.102409 35.495730 36.766783 ... 46.495330 54.466000 61.341291 72.874263 30.879288 73.280038 46.580004 69.271669 46.492000 79.417480
50% 66.129065 36.913319 29.655440 37.048085 59.327154 37.845220 73.240045 37.624207 36.620735 37.920336 ... 47.900511 57.638582 63.199057 75.682465 33.034022 75.941985 47.939031 72.395978 47.980854 82.388899
75% 67.332093 38.286438 30.858673 38.333884 60.521492 39.195431 74.978684 39.308331 37.665280 39.143921 ... 49.260657 60.303524 65.302398 78.774969 35.063575 78.884031 49.290000 75.840286 49.551936 85.697976
max 78.013082 46.132421 42.495172 45.980981 69.023030 47.190316 87.032252 49.653825 47.293746 44.887301 ... 65.279654 75.992731 84.767123 94.673637 50.973348 93.443176 61.804506 93.916338 62.438095 95.808983

8 rows × 30 columns

We perform a sanity check for the data by visualising 64 randomly chosen images along with their key facial points.


Image augmentation¶

Here we create an additional dataset where the images are changed slightly to improve the generalisation of the final AI model. We want more data and more variability in e.g. orientation, lighting conditions, and image size. This reduces the likelihood of overfitting and ensures that the model learns the meaningful "concepts" of emotion recognition. We create this extra dataset by copying the original data and tweaking the copies.

I will create 4 types of augmented images:

  1. horizontal flipping
  2. randomly increased brightness
  3. vertical flipping
  4. rotation by a random angle
After each of the four augmentation steps the dataset grows by another 2140 rows:

(4280, 31)
(6420, 31)
(8560, 31)
(10700, 31)
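To make the keypoint bookkeeping concrete, here is a minimal sketch of horizontal flipping, assuming 96-pixel-wide images and keypoints stored as alternating x, y values (the exact reflection convention, `width - x` versus `width - 1 - x`, is an illustrative choice):

```python
import numpy as np

def hflip(image, keypoints, width=96):
    # Mirror the pixels left-right; reflect the x-coordinates
    # (even indices) around the image width. The y-coordinates
    # (odd indices) stay unchanged.
    flipped = image[:, ::-1]
    kp = keypoints.copy()
    kp[::2] = width - kp[::2]
    return flipped, kp
```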

Data normalization and scaling¶

I normalize the image pixel values to the range 0 to 1, which generally improves training in neural networks.
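The scaling itself is a single division; a sketch assuming 8-bit images (the construction of `img_array` used below is analogous):

```python
import numpy as np

# Hypothetical batch of 8-bit grayscale images, shape (N, 96, 96, 1)
images = np.random.randint(0, 256, size=(4, 96, 96, 1))

# Scale pixel values from 0-255 down to 0-1
img_array = images.astype(np.float32) / 255.0
```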

# Obtain the x and y coordinates to be used as target
img_target = augmented_df[:,:30]
img_target = np.asarray(img_target).astype(np.float32)
img_target.shape
(10700, 30)
# Split the data into train and test data
X_train_kp, X_test_kp, y_train_kp, y_test_kp = train_test_split(img_array, img_target, test_size=0.2, random_state=42)
X_train_kp.shape
(8560, 96, 96, 1)
X_test_kp.shape
(2140, 96, 96, 1)
y_test_kp.shape
(2140, 30)
y_train_kp.shape
(8560, 30)

Building the Residual Neural Network model for key facial points detection¶

Kernels modify the input by sweeping over it, as shown in this animation:

2D Convolution Animation

Fig. 4 Performing a convolution on 6x6 input with a 3x3 kernel using stride 1x1. Credit: Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons.

For example, we could perform a 2D convolution for our input with this command:

X = Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), kernel_initializer = glorot_uniform(seed=0))(X_input)

Here we tell the function that we want to

  • use 64 distinct filters (each one is a trainable 7×7 "weight grid").
  • use stride 2x2, i.e., the filter jumps 2 pixels at a time, effectively "skipping" every other location.
  • initialise the kernels with the glorot_uniform method, a.k.a. Xavier uniform initialization. This draws samples from a uniform distribution whose range is determined by the number of input and output units.
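With these parameters, the output spatial size follows the standard convolution formula, which reproduces the 48×48 shape seen in the model summary below:

```python
def conv_output_size(n, kernel, stride, padding=0):
    # Standard formula: floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

# 96x96 input, zero-padded by 3 pixels per side, 7x7 kernel, stride 2
out = conv_output_size(96, kernel=7, stride=2, padding=3)
```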

In this section I define the model architecture using Keras. Below is the code to generate Resblocks.

# @title Resblock

def res_block(X, filter, stage):
  """
  Implementation of the Resblock.

  Arguments:
  X -- input tensor
  filter -- tuple/list of integers, the number of filters for each conv layer (f1, f2, f3)
  stage -- string, used as a prefix to name the layers

  Returns:
  X -- output of the res block
  """
  ### 1: Convolutional block###
  # Make a copy of the input
  X_shortcut = X

  f1, f2, f3 = filter

  # ----Long (main) path-----
  # Conv2d
  X = Conv2D(f1, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_a', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  # MaxPool2D
  X = MaxPool2D(pool_size=(2,2))(X)
  # BatchNorm,ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_a')(X)
  X = Activation('relu')(X)

  # Conv2D (kernel 3x3)
  X = Conv2D(f2, kernel_size = (3,3), strides = (1,1), padding = 'same', name=str(stage)+'convblock'+'_conv_b', \
            kernel_initializer = glorot_uniform(seed=0))(X)
  # BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_b')(X)
  X = Activation('relu')(X)

  #Conv2D
  X = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_c', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  #BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_c')(X)


  # ----Short path----

  # Conv2D
  X_shortcut = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_short', \
                      kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
  # MaxPool2D and Batchnorm
  X_shortcut = MaxPool2D(pool_size=(2,2))(X_shortcut)
  X_shortcut = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_short')(X_shortcut)


  # ----Add Paths together----
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 2: Identity block 1 ###
  # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden1'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 3: Identity block 2 ###
  # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden2'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  return X

Now that the Resblock is defined we can build the final model.

# @title Final Resnet Neural Network model

input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(filters = 64, kernel_size = (7,7), strides = (2,2), name='conv1', \
           kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter =  [64, 64, 256], stage = 'res1')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res2')

# We could also add more resblocks if we want
# X = res_block(X, filter= [256,256,1024], stage= 'res3')

# Average pooling
X = AveragePooling2D((2,2), name = 'avg_pool')(X)

# Flatten
X = Flatten()(X)

# Dense, ReLU, Dropout
X = Dense(4096, activation = 'relu')(X)
X = Dropout(0.2)(X)
X = Dense(2048, activation = 'relu')(X)
X = Dropout(0.1)(X)
X = Dense(30, activation = 'relu')(X)

model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)
Model: "functional"
+---------------------+-------------------+------------+-------------------+
| Layer (type)        | Output Shape      |    Param # | Connected to      |
+---------------------+-------------------+------------+-------------------+
| input_layer         | (None, 96, 96, 1) |          0 | -                 |
| (InputLayer)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| zero_padding2d      | (None, 102, 102,  |          0 | input_layer[0][0] |
| (ZeroPadding2D)     | 1)                |            |                   |
+---------------------+-------------------+------------+-------------------+
| conv1 (Conv2D)      | (None, 48, 48,    |      3,200 | zero_padding2d[0… |
|                     | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| bn_conv1            | (None, 48, 48,    |        256 | conv1[0][0]       |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation          | (None, 48, 48,    |          0 | bn_conv1[0][0]    |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d       | (None, 23, 23,    |          0 | activation[0][0]  |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 23, 23,    |      4,160 | max_pooling2d[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_1     | (None, 11, 11,    |          0 | res1convblock_co… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_a  | (None, 11, 11,    |        256 | max_pooling2d_1[… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_1        | (None, 11, 11,    |          0 | res1convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 11, 11,    |     36,928 | activation_1[0][… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_b  | (None, 11, 11,    |        256 | res1convblock_co… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_2        | (None, 11, 11,    |          0 | res1convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 23, 23,    |     16,640 | max_pooling2d[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_conv… | (None, 11, 11,    |     16,640 | activation_2[0][… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_2     | (None, 11, 11,    |          0 | res1convblock_co… |
| (MaxPooling2D)      | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_c  | (None, 11, 11,    |      1,024 | res1convblock_co… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1convblock_bn_s… | (None, 11, 11,    |      1,024 | max_pooling2d_2[… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add (Add)           | (None, 11, 11,    |          0 | res1convblock_bn… |
|                     | 256)              |            | res1convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_3        | (None, 11, 11,    |          0 | add[0][0]         |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_a    | (None, 11, 11,    |     16,448 | activation_3[0][… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_a      | (None, 11, 11,    |        256 | res1iden1_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_4        | (None, 11, 11,    |          0 | res1iden1_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_b    | (None, 11, 11,    |     36,928 | activation_4[0][… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_b      | (None, 11, 11,    |        256 | res1iden1_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_5        | (None, 11, 11,    |          0 | res1iden1_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_conv_c    | (None, 11, 11,    |     16,640 | activation_5[0][… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden1_bn_c      | (None, 11, 11,    |      1,024 | res1iden1_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_1 (Add)         | (None, 11, 11,    |          0 | res1iden1_bn_c[0… |
|                     | 256)              |            | activation_3[0][… |
+---------------------+-------------------+------------+-------------------+
| activation_6        | (None, 11, 11,    |          0 | add_1[0][0]       |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_a    | (None, 11, 11,    |     16,448 | activation_6[0][… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_a      | (None, 11, 11,    |        256 | res1iden2_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_7        | (None, 11, 11,    |          0 | res1iden2_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_b    | (None, 11, 11,    |     36,928 | activation_7[0][… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_b      | (None, 11, 11,    |        256 | res1iden2_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_8        | (None, 11, 11,    |          0 | res1iden2_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_conv_c    | (None, 11, 11,    |     16,640 | activation_8[0][… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res1iden2_bn_c      | (None, 11, 11,    |      1,024 | res1iden2_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_2 (Add)         | (None, 11, 11,    |          0 | res1iden2_bn_c[0… |
|                     | 256)              |            | activation_6[0][… |
+---------------------+-------------------+------------+-------------------+
| activation_9        | (None, 11, 11,    |          0 | add_2[0][0]       |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     32,896 | activation_9[0][… |
| (Conv2D)            | 128)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_3     | (None, 5, 5, 128) |          0 | res2convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_a  | (None, 5, 5, 128) |        512 | max_pooling2d_3[… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_10       | (None, 5, 5, 128) |          0 | res2convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 5, 5, 128) |    147,584 | activation_10[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_b  | (None, 5, 5, 128) |        512 | res2convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_11       | (None, 5, 5, 128) |          0 | res2convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |    131,584 | activation_9[0][… |
| (Conv2D)            | 512)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 5, 5, 512) |     66,048 | activation_11[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_4     | (None, 5, 5, 512) |          0 | res2convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_c  | (None, 5, 5, 512) |      2,048 | res2convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_s… | (None, 5, 5, 512) |      2,048 | max_pooling2d_4[… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_3 (Add)         | (None, 5, 5, 512) |          0 | res2convblock_bn… |
|                     |                   |            | res2convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_12       | (None, 5, 5, 512) |          0 | add_3[0][0]       |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_a    | (None, 5, 5, 128) |     65,664 | activation_12[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_a      | (None, 5, 5, 128) |        512 | res2iden1_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_13       | (None, 5, 5, 128) |          0 | res2iden1_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_b    | (None, 5, 5, 128) |    147,584 | activation_13[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_b      | (None, 5, 5, 128) |        512 | res2iden1_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_14       | (None, 5, 5, 128) |          0 | res2iden1_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_c    | (None, 5, 5, 512) |     66,048 | activation_14[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_c      | (None, 5, 5, 512) |      2,048 | res2iden1_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_4 (Add)         | (None, 5, 5, 512) |          0 | res2iden1_bn_c[0… |
|                     |                   |            | activation_12[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_15       | (None, 5, 5, 512) |          0 | add_4[0][0]       |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_a    | (None, 5, 5, 128) |     65,664 | activation_15[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_a      | (None, 5, 5, 128) |        512 | res2iden2_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_16       | (None, 5, 5, 128) |          0 | res2iden2_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_b    | (None, 5, 5, 128) |    147,584 | activation_16[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_b      | (None, 5, 5, 128) |        512 | res2iden2_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_17       | (None, 5, 5, 128) |          0 | res2iden2_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_c    | (None, 5, 5, 512) |     66,048 | activation_17[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_c      | (None, 5, 5, 512) |      2,048 | res2iden2_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_5 (Add)         | (None, 5, 5, 512) |          0 | res2iden2_bn_c[0… |
|                     |                   |            | activation_15[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_18       | (None, 5, 5, 512) |          0 | add_5[0][0]       |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| avg_pool            | (None, 2, 2, 512) |          0 | activation_18[0]… |
| (AveragePooling2D)  |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| flatten (Flatten)   | (None, 2048)      |          0 | avg_pool[0][0]    |
+---------------------+-------------------+------------+-------------------+
| dense (Dense)       | (None, 4096)      |  8,392,704 | flatten[0][0]     |
+---------------------+-------------------+------------+-------------------+
| dropout (Dropout)   | (None, 4096)      |          0 | dense[0][0]       |
+---------------------+-------------------+------------+-------------------+
| dense_1 (Dense)     | (None, 2048)      |  8,390,656 | dropout[0][0]     |
+---------------------+-------------------+------------+-------------------+
| dropout_1 (Dropout) | (None, 2048)      |          0 | dense_1[0][0]     |
+---------------------+-------------------+------------+-------------------+
| dense_2 (Dense)     | (None, 30)        |     61,470 | dropout_1[0][0]   |
└---------------------┴-------------------┴------------┴-------------------┘
 Total params: 18,016,286 (68.73 MB)
 Trainable params: 18,007,710 (68.69 MB)
 Non-trainable params: 8,576 (33.50 KB)


  

Explanations of components¶

The ZeroPadding2D layer adds a border of zeros (3 pixels wide) around the input image. This prevents information loss at the edges during convolutions.
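As a minimal illustration of what zero padding does, the effect of `ZeroPadding2D((3,3))` can be reproduced with `np.pad` (the image here is a dummy array, not one from the data set):

```python
import numpy as np

img = np.ones((96, 96), dtype=np.float32)  # one dummy 96x96 grayscale image
padded = np.pad(img, 3)                    # 3-pixel zero border on every side

# 96 + 3 + 3 = 102, matching the ZeroPadding2D output shape in the model summary
print(padded.shape)
```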

Conv2D is the cake of our convolutional network. It applies filters to the input image and slides them across it with a set stride. This way the features are extracted from the image.

The BatchNormalisation layer normalizes the output of the convolution, making training more stable. We can say it is the smooth cream layer on our convolution cake.

The ReLU activation function introduces non-linearity to the model.

MaxPooling2D reduces the spatial dimensions of the feature maps by taking the maximum value in each window, thereby downsampling the output. After the Resblock, AveragePooling2D is used similarly, except that it computes the average value within the window; it also reduces the size of the feature maps. To give an impression of the impact of pooling: if we removed the MaxPooling2D layers from the Resblocks, the final model would have 256 million parameters instead of 18 million.
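A minimal NumPy sketch of non-overlapping max pooling on a single 2-D feature map (a simplified stand-in for the Keras layer, which also handles batches, channels and strides):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling on a single 2-D feature map."""
    h, w = x.shape
    x = x[:h - h % k, :w - w % k]  # drop edge rows/cols that do not fit a full window
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool2d(fmap)  # each 2x2 window collapses to its maximum -> shape (2, 2)
```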

Flatten converts the multi-dimensional feature maps into a single, long vector, preparing the data for the fully connected layers.

Dense creates a fully connected layer where each neuron is connected to every neuron in the previous layer. These fully connected layers process the features extracted by the convolutional layers.

Dropout layers are a regularisation technique which drops a set percentage of the neurons during training by setting them to zero. This makes the model less likely to overfit, and decreases the interdependency between the neurons. Therefore we improve the performance of the network and the generalisability of the model.
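The dropout mechanism described above can be sketched in a few lines of NumPy ("inverted" dropout, which is what Keras uses internally; the rescaling keeps the expected activation unchanged so no adjustment is needed at inference time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    # Zero out a fraction `rate` of activations during training and rescale
    # the survivors by 1/(1-rate) so the expected activation is unchanged.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

a = np.ones(10_000)
out = dropout(a, rate=0.5)  # about half zeros, survivors scaled to 2.0
```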

The final model has a very complex structure with 18 million trainable parameters, which allows it to learn to identify emotions as well as, or even better than, an average human. However, too many parameters can lead to problems such as overfitting and slow or non-converging training. Optimising this many parameters is not a trivial task.

Compiling and training the model¶

I will use the Adam optimization method for the training. Adam is a computationally efficient stochastic gradient method that combines gradient descent with momentum and the RMSP algorithm.

As discussed earlier, momentum speeds up training by adding a fraction of the previous gradient to the current one. RMSP, or Root Mean Square Propagation, is an adaptive learning-rate algorithm that takes the 'exponential moving average' of the squared gradients. In other words, it adapts the learning rate for each parameter by keeping track of an exponentially decaying average of past squared gradients.

The algorithm proceeds as follows:

1. Calculate the gradient $g_t$

$g_t = \frac{\delta L }{\delta w_t}$

2. Update the Biased first moment estimate $m_t$

$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$

This is similar to calculating the momentum as we keep track of the decaying average of past gradients.

3. Update the Biased Second Moment Estimate $v_t$

$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$

This is similar to RMSP as we keep track of an exponentially decaying average of past squared gradients.

4. Bias correction for $m_t$ and $v_t$

Especially at the beginning of training, $m_t$ and $v_t$ are biased toward zero (because they are initialised at zero). Adam corrects this as follows:

$\hat m = \frac{m_t}{1-\beta_1^t}$, $\hat v = \frac{v_t}{1-\beta_2^t}$

5. Parameter update

$w_{t} = w_{t-1} - \alpha_t\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}$

where,

$g_t$ = gradient of the loss with respect to the parameters at iteration $t$

$\alpha_t$ = learning rate at iteration $t$

$\beta_1, \beta_2$ = decay rates for the moment estimates

$\epsilon$ = small constant to prevent division by zero
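The five steps above can be sketched in plain NumPy (a minimal illustration, not the TensorFlow implementation; the quadratic test function $f(w)=w^2$ and the hyperparameter values below are my own choices):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w with gradient g at iteration t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g            # step 2: biased first moment
    v = beta2 * v + (1 - beta2) * g**2         # step 3: biased second moment
    m_hat = m / (1 - beta1**t)                 # step 4: bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step 5: parameter update
    return w, m, v

# minimise f(w) = w^2 (so the step-1 gradient is g = 2w), starting from w = 1.0
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
print(w)  # w has been driven close to the minimum at 0
```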

The TensorFlow implementation of the Adam optimizer accepts several arguments:

  • learning_rate: a float or a scheduler that adapts the learning rate during training

  • beta_1: a float value or constant float tensor giving the exponential decay rate for the 1st moment estimates, i.e. the means of the gradients. Default = 0.9.

  • beta_2: a float value or constant float tensor giving the exponential decay rate for the 2nd moment estimates, i.e. the exponentially weighted averages of the squared gradients. Default = 0.999.

  • amsgrad: True/False. Whether to apply the AMSGrad variant of the algorithm presented in the paper On the Convergence of Adam and Beyond. Default = False.

  • weight_decay: if set, applies the given weight decay to the parameters.

Other things to consider when optimising¶

The batch size determines how many training examples are processed before the model's internal parameters are updated. Smaller batch sizes can speed up the training per epoch because the model updates more frequently. However, this can lead to less stable convergence, i.e. the training loss may fluctuate more. A small batch size can be beneficial when the model is overfitting (the training loss is significantly lower than the validation loss).

A larger batch size leads to slower training per epoch and requires more memory, but can yield more stable parameter updates. The model usually converges more smoothly, but might not generalise as well due to "sharp minima".
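To make the update frequency concrete, here is how the number of parameter updates per epoch varies with batch size for the 8,560 keypoint training images used below:

```python
import math

# parameter updates per epoch = number of batches needed to cover the training set
n_train = 8560
updates = {bs: math.ceil(n_train / bs) for bs in (32, 64, 128)}
print(updates)  # smaller batches -> more frequent (but noisier) updates
```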

Another way to tune the optimization is to use learning rate schedulers. Why? As training progresses, the model gets closer to a good solution. Smaller learning rates allow for finer adjustments to the model's weights, helping it converge to a better minimum without overshooting (see the gradient descent examples in the beginning). I have implemented a scheduler that reduces the learning rate if the validation loss does not improve within 5 epochs.

After training, the model is saved in a .keras file. The .keras is a zip archive that contains:

  • The architecture
  • The weights
  • The optimizer's status
# @title Compiling and training with 3 epochs
run_example = False
if run_example:
  adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, \
                                  beta_2 = 0.999, amsgrad = False)
  model_3_facialKeyPoints = Model(inputs = X_input, outputs = X)
  model_3_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                  metrics = ['accuracy'])

  #Save the best model with least validation loss here
  checkpoint  = ModelCheckpoint(filepath = "Models/FacialKeyPoints_model_3.keras", \
                                verbose = 1, save_best_only = True)

  history3 = model_3_facialKeyPoints.fit(X_train_kp, y_train_kp, batch_size = 32, \
                    epochs = 3, validation_split = 0.05, callbacks=[checkpoint])
No description has been provided for this image
# @title Compiling and training with batch_size = 64, epochs = 100, and decay on plateau of the learning rate

if retrain_model:

  initial_learning_rate=0.0008

  # compile model
  adam = tf.keras.optimizers.Adam(learning_rate = initial_learning_rate, beta_1 = 0.9, \
                                  beta_2 = 0.999, amsgrad = False)
  model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)
  model_1_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                  metrics = ['accuracy'])
  # Callbacks: reduce lr on plateau
  reduce_lr = ReduceLROnPlateau(
      monitor='val_loss',
      factor=0.65,
      patience=5,
      min_lr=1e-8,
      verbose=1
  )

  early = EarlyStopping(
    monitor='val_loss',
    patience=12,
    restore_best_weights=True,
    verbose=1,
    mode = 'min'
  )


  # Callbacks: save best model
  checkpoint = ModelCheckpoint(
      filepath="Models/FacialKeyPoints_model_1.keras",
      verbose=1,
      save_best_only=True
  )

  # Callbacks: logs epoch results to CSV
  csv_logger = CSVLogger(
      'Models/training_history_model_1.csv',
      append=True,         # keep adding if file exists
      separator=','        # comma-separated
  )
  # fit with CSVLogger included
  history = model_1_facialKeyPoints.fit(
      X_train_kp, y_train_kp,
      batch_size=64,
      epochs=100,
      validation_split=0.05,
      callbacks=[checkpoint, reduce_lr, csv_logger, early]
  )
print(X_train_kp.shape)   # e.g. (N, 96, 96, 1)
print(y_train_kp.shape)   # should print (N, 30)
(8560, 96, 96, 1)
(8560, 30)

Assessing the trained key facial points detection model performance¶

# load the saved model (best checkpoint)
adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, \
                                beta_2 = 0.999, amsgrad = False)
model_1_facialKeyPoints = tf.keras.models.load_model("Models/FacialKeyPoints_model_1.keras")
model_1_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                metrics = ['accuracy'])
# Evaluate the model
# The model from materials has loss: 8.3705 accuracy: 0.85280377 with the X_test,y_test set.

result = model_1_facialKeyPoints.evaluate(X_test_kp, y_test_kp)
67/67 ━━━━━━━━━━━━━━━━━━━━ 6s 35ms/step - accuracy: 0.8072 - loss: 37.0998
predicted_kp = model_1_facialKeyPoints.predict(X_test_kp)
67/67 ━━━━━━━━━━━━━━━━━━━━ 3s 29ms/step
predicted_kp = pd.DataFrame(predicted_kp, columns=columns)
predicted_kp
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_x nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y
0 27.774412 41.608402 69.701347 39.645924 36.824764 42.768158 18.683453 43.374584 61.305141 41.332291 ... 51.949940 64.388763 33.121056 88.780327 69.142975 87.274284 51.623299 84.660240 52.020306 94.269218
1 63.867874 59.837021 29.558840 56.851757 57.666470 58.394218 70.214401 59.508846 35.728714 56.771957 ... 48.389038 41.098076 64.282990 24.717127 35.385838 22.573709 49.672440 26.816153 50.343761 18.170223
2 66.701538 38.380432 30.169611 36.045486 60.238499 39.013817 73.072281 39.661930 36.369148 37.416374 ... 47.273544 59.190262 60.052547 82.078575 32.966862 80.368378 46.807598 75.696198 46.202122 88.636078
3 29.851618 40.825123 68.809952 39.051144 37.333893 41.556328 22.312656 42.364140 61.609417 40.434784 ... 49.816231 61.445831 34.389442 85.753059 66.853058 84.536171 50.709244 78.521538 50.523697 94.321213
4 29.966978 34.698601 66.767433 40.271515 37.919559 37.025856 21.968967 34.297169 59.200417 40.350548 ... 47.481728 60.271687 26.436144 72.177620 59.407753 76.562050 43.661575 75.111107 42.907543 80.085808
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2135 69.228844 55.780846 44.195251 25.819244 63.851627 50.836746 72.786087 62.208595 48.127544 32.177284 ... 40.634342 54.915756 34.938332 80.414986 15.325096 55.738548 28.048006 64.707710 19.336086 73.635452
2136 64.919174 63.792625 55.612881 27.871506 62.441170 57.269558 65.713219 70.790627 56.210602 34.523991 ... 39.838898 51.735191 25.973181 69.893890 18.873905 40.596218 25.673395 54.140991 15.005697 58.321358
2137 67.449257 36.665802 29.131422 36.421165 60.840542 36.884842 74.299210 37.797375 35.730961 36.823277 ... 46.935879 49.284603 63.658436 73.521004 32.097244 73.253464 47.337311 65.122261 46.956753 81.887741
2138 67.602493 59.835785 52.191608 26.726240 64.597107 54.131611 69.025040 66.138535 53.892822 32.844337 ... 42.994911 50.666443 32.735664 70.033546 21.022322 42.965286 30.660442 54.207699 20.245888 60.698238
2139 30.817675 35.639484 64.787453 37.358612 37.058781 37.048462 24.390444 36.065876 58.306847 38.046780 ... 45.163116 57.671455 31.936731 73.423302 60.704609 74.940430 45.779903 71.358398 45.488407 81.041817

2140 rows × 30 columns

# @title Printing out samples of predictions
fig, axes = plt.subplots(4,4, figsize=(10,10))
axes = axes.ravel()

for i in range(16):
  axes[i].imshow(X_test_kp[i].reshape(96,96), cmap='gray')
  axes[i].axis('off')
  for j in range(1,31,2):
      axes[i].plot(predicted_kp.iloc[i,j-1],predicted_kp.iloc[i,j], marker='.', color='r')

#plt.tight_layout()
plt.show()
No description has been provided for this image

Part 2. Facial Expression detection¶

In this second part of the project, I train the second model which will classify emotions. The data contains images that belong to 5 categories:

  • 0 = Angry
  • 1 = Disgust
  • 2 = Sad
  • 3 = Happy
  • 4 = Surprise

The images in the data set are of size 48px * 48px. Therefore they need to be resized so that the expression detection model can be run together with the key facial point detection model.
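As a minimal sketch of the upsampling step, here is a nearest-neighbour 2x resize from 48x48 to 96x96 in NumPy (the notebook's actual resizing presumably uses an interpolating resize; the input image here is a dummy array):

```python
import numpy as np

# hypothetical 48x48 grayscale image standing in for one from the data set
img48 = np.arange(48 * 48, dtype=np.float32).reshape(48, 48)

# nearest-neighbour 2x upsample: repeat every pixel twice along both axes
img96 = np.repeat(np.repeat(img48, 2, axis=0), 2, axis=1)
print(img96.shape)
```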

Below is an example of an original image, results from resizing and final image after interpolation.

No description has been provided for this image

Visualising the images in the dataset with the emotions¶

No description has been provided for this image
expression_df.head()
   emotion                                             pixels
0        0  [[69.316925, 73.03865, 79.13719, 84.17186, 85....
1        0  [[151.09435, 150.91393, 150.65791, 148.96367, ...
2        2  [[23.061905, 25.50914, 29.47847, 33.99843, 36....
3        2  [[20.083221, 19.079437, 17.398712, 17.158691, ...
4        3  [[76.26172, 76.54747, 77.001785, 77.7672, 78.4...

Below are the counts of each emotion category. Our data is highly imbalanced, with very few images portraying disgust and many images in the happy category.

No description has been provided for this image

Data preparation and image augmentation¶

X shape (24568, 96, 96, 1)
y shape (24568, 5)
X train shape (22111, 96, 96, 1)
y train shape (22111, 5)
X val shape (1228, 96, 96, 1)
y val shape (1228, 5)
X test shape (1229, 96, 96, 1)
y test shape (1229, 5)

Data preprocessing¶

In the data preprocessing I will again normalize the data and perform image augmentation, as was done in Part 1 of the project.

First, I normalize the data to contain values between 0 and 1. Then, I use the following image augmentation techniques:

  1. rotating up to 15 degrees
  2. shifting the image horizontally up to 0.1*image width
  3. shifting the image vertically up to 0.1*image height
  4. shearing the image up to 0.1
  5. zooming the image up to 10 %
  6. horizontally flipping the image
  7. vertically flipping the image
  8. adjusting the brightness

The spaces outside the boundaries are filled by replicating the nearest pixels.
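Assuming the augmentation is done with Keras' ImageDataGenerator (a common choice for this kind of pipeline; the exact brightness range below is an assumption, as the text only says brightness is adjusted), the list above could be configured like this:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,          # 1. rotate up to 15 degrees
    width_shift_range=0.1,      # 2. horizontal shift up to 0.1 * width
    height_shift_range=0.1,     # 3. vertical shift up to 0.1 * height
    shear_range=0.1,            # 4. shear up to 0.1
    zoom_range=0.1,             # 5. zoom up to 10 %
    horizontal_flip=True,       # 6. horizontal flip
    vertical_flip=True,         # 7. vertical flip
    brightness_range=[0.8, 1.2],  # 8. brightness adjustment (range assumed)
    fill_mode='nearest',        # fill gaps by replicating the nearest pixels
)
```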

Build and train Deep Learning model for facial expression classification¶

The model I will build has the following architecture:

Emotion Detection model: INPUT → Zero padding → Conv2D → BatchNorm, ReLU → MaxPool2D → Res-block → Res-block → AveragePooling2D → Flatten → Dense, ReLU, Dropout → OUTPUT
# @title Emotion recognition model

input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(64, (7,7), strides = (2,2), name = 'conv1', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter = [64,64,256], stage = 'res2')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res3')

# Stage 4 (optional)
#X = res_block(X, filter= [256,256,1024], stage = 'res4')

# Average pooling
X = AveragePooling2D((4,4), name = 'avg_pool')(X)

# Final layer
X = Flatten()(X)
X  = Dense(5, activation = 'softmax', name = 'dense', kernel_initializer=glorot_uniform(seed=0))(X)

Emotion_det_model_2 = Model(inputs = X_input, outputs = X, name = 'Resnet18')
Model: "Resnet18"
+---------------------+-------------------+------------+-------------------+
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
+---------------------+-------------------+------------+-------------------+
| input_layer_1       | (None, 96, 96, 1) |          0 | -                 |
| (InputLayer)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| zero_padding2d_1    | (None, 102, 102,  |          0 | input_layer_1[0]… |
| (ZeroPadding2D)     | 1)                |            |                   |
+---------------------+-------------------+------------+-------------------+
| conv1 (Conv2D)      | (None, 48, 48,    |      3,200 | zero_padding2d_1… |
|                     | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| bn1                 | (None, 48, 48,    |        256 | conv1[0][0]       |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_19       | (None, 48, 48,    |          0 | bn1[0][0]         |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_5     | (None, 23, 23,    |          0 | activation_19[0]… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 23, 23,    |      4,160 | max_pooling2d_5[… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_6     | (None, 11, 11,    |          0 | res2convblock_co… |
| (MaxPooling2D)      | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_a  | (None, 11, 11,    |        256 | max_pooling2d_6[… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_20       | (None, 11, 11,    |          0 | res2convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     36,928 | activation_20[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_b  | (None, 11, 11,    |        256 | res2convblock_co… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_21       | (None, 11, 11,    |          0 | res2convblock_bn… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 23, 23,    |     16,640 | max_pooling2d_5[… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_conv… | (None, 11, 11,    |     16,640 | activation_21[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_7     | (None, 11, 11,    |          0 | res2convblock_co… |
| (MaxPooling2D)      | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_c  | (None, 11, 11,    |      1,024 | res2convblock_co… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2convblock_bn_s… | (None, 11, 11,    |      1,024 | max_pooling2d_7[… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_6 (Add)         | (None, 11, 11,    |          0 | res2convblock_bn… |
|                     | 256)              |            | res2convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_22       | (None, 11, 11,    |          0 | add_6[0][0]       |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_a    | (None, 11, 11,    |     16,448 | activation_22[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_a      | (None, 11, 11,    |        256 | res2iden1_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_23       | (None, 11, 11,    |          0 | res2iden1_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_b    | (None, 11, 11,    |     36,928 | activation_23[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_b      | (None, 11, 11,    |        256 | res2iden1_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_24       | (None, 11, 11,    |          0 | res2iden1_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_conv_c    | (None, 11, 11,    |     16,640 | activation_24[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden1_bn_c      | (None, 11, 11,    |      1,024 | res2iden1_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_7 (Add)         | (None, 11, 11,    |          0 | res2iden1_bn_c[0… |
|                     | 256)              |            | activation_22[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_25       | (None, 11, 11,    |          0 | add_7[0][0]       |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_a    | (None, 11, 11,    |     16,448 | activation_25[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_a      | (None, 11, 11,    |        256 | res2iden2_conv_a… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_26       | (None, 11, 11,    |          0 | res2iden2_bn_a[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_b    | (None, 11, 11,    |     36,928 | activation_26[0]… |
| (Conv2D)            | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_b      | (None, 11, 11,    |        256 | res2iden2_conv_b… |
| (BatchNormalizatio… | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_27       | (None, 11, 11,    |          0 | res2iden2_bn_b[0… |
| (Activation)        | 64)               |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_conv_c    | (None, 11, 11,    |     16,640 | activation_27[0]… |
| (Conv2D)            | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res2iden2_bn_c      | (None, 11, 11,    |      1,024 | res2iden2_conv_c… |
| (BatchNormalizatio… | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_8 (Add)         | (None, 11, 11,    |          0 | res2iden2_bn_c[0… |
|                     | 256)              |            | activation_25[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_28       | (None, 11, 11,    |          0 | add_8[0][0]       |
| (Activation)        | 256)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 11, 11,    |     32,896 | activation_28[0]… |
| (Conv2D)            | 128)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_8     | (None, 5, 5, 128) |          0 | res3convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_a  | (None, 5, 5, 128) |        512 | max_pooling2d_8[… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_29       | (None, 5, 5, 128) |          0 | res3convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 5, 5, 128) |    147,584 | activation_29[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_b  | (None, 5, 5, 128) |        512 | res3convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_30       | (None, 5, 5, 128) |          0 | res3convblock_bn… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 11, 11,    |    131,584 | activation_28[0]… |
| (Conv2D)            | 512)              |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_conv… | (None, 5, 5, 512) |     66,048 | activation_30[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| max_pooling2d_9     | (None, 5, 5, 512) |          0 | res3convblock_co… |
| (MaxPooling2D)      |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_c  | (None, 5, 5, 512) |      2,048 | res3convblock_co… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3convblock_bn_s… | (None, 5, 5, 512) |      2,048 | max_pooling2d_9[… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_9 (Add)         | (None, 5, 5, 512) |          0 | res3convblock_bn… |
|                     |                   |            | res3convblock_bn… |
+---------------------+-------------------+------------+-------------------+
| activation_31       | (None, 5, 5, 512) |          0 | add_9[0][0]       |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_a    | (None, 5, 5, 128) |     65,664 | activation_31[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_a      | (None, 5, 5, 128) |        512 | res3iden1_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_32       | (None, 5, 5, 128) |          0 | res3iden1_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_b    | (None, 5, 5, 128) |    147,584 | activation_32[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_b      | (None, 5, 5, 128) |        512 | res3iden1_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_33       | (None, 5, 5, 128) |          0 | res3iden1_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_conv_c    | (None, 5, 5, 512) |     66,048 | activation_33[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden1_bn_c      | (None, 5, 5, 512) |      2,048 | res3iden1_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_10 (Add)        | (None, 5, 5, 512) |          0 | res3iden1_bn_c[0… |
|                     |                   |            | activation_31[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_34       | (None, 5, 5, 512) |          0 | add_10[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_a    | (None, 5, 5, 128) |     65,664 | activation_34[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_a      | (None, 5, 5, 128) |        512 | res3iden2_conv_a… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_35       | (None, 5, 5, 128) |          0 | res3iden2_bn_a[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_b    | (None, 5, 5, 128) |    147,584 | activation_35[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_b      | (None, 5, 5, 128) |        512 | res3iden2_conv_b… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| activation_36       | (None, 5, 5, 128) |          0 | res3iden2_bn_b[0… |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_conv_c    | (None, 5, 5, 512) |     66,048 | activation_36[0]… |
| (Conv2D)            |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| res3iden2_bn_c      | (None, 5, 5, 512) |      2,048 | res3iden2_conv_c… |
| (BatchNormalizatio… |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| add_11 (Add)        | (None, 5, 5, 512) |          0 | res3iden2_bn_c[0… |
|                     |                   |            | activation_34[0]… |
+---------------------+-------------------+------------+-------------------+
| activation_37       | (None, 5, 5, 512) |          0 | add_11[0][0]      |
| (Activation)        |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| avg_pool            | (None, 1, 1, 512) |          0 | activation_37[0]… |
| (AveragePooling2D)  |                   |            |                   |
+---------------------+-------------------+------------+-------------------+
| flatten_1 (Flatten) | (None, 512)       |          0 | avg_pool[0][0]    |
+---------------------+-------------------+------------+-------------------+
| dense (Dense)       | (None, 5)         |      2,565 | flatten_1[0][0]   |
└---------------------┴-------------------┴------------┴-------------------┘
 Total params: 1,174,021 (4.48 MB)
 Trainable params: 1,165,445 (4.45 MB)
 Non-trainable params: 8,576 (33.50 KB)


  
print(f"Training samples: {len(X_train_ed)}")
batch_size = 64
print(f"Batch size: {batch_size}")
steps_per_epoch = np.ceil(len(X_train_ed) / batch_size).astype(int)
print(f"Steps per epoch: {steps_per_epoch}")
Training samples: 22111
Batch size: 64
Steps per epoch: 346
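The step count is just the ceiling of the sample count over the batch size; the final, partial batch accounts for the rounding up. A quick check of the arithmetic with the numbers printed above:

```python
import math

n_samples, batch_size = 22111, 64
steps = math.ceil(n_samples / batch_size)
# 345 full batches of 64 cover 22080 samples; the 347th would be empty,
# so the remaining 31 samples form one last partial batch: 345 + 1 = 346
print(steps)
```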

Evaluate model¶

Confusion matrix, accuracy, precision, and recall

No description has been provided for this image
39/39 ━━━━━━━━━━━━━━━━━━━━ 5s 51ms/step - accuracy: 0.7534 - loss: 0.5960
39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 46ms/step
No description has been provided for this image
No description has been provided for this image
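The class labels fed into the confusion matrix and the report are obtained by taking the argmax over the model's five-way softmax output. A minimal sketch with hypothetical probability rows (the actual arrays come from `model.predict` on the test set):

```python
import numpy as np

# hypothetical softmax outputs for three samples over the five emotion classes
probs = np.array([
    [0.05, 0.02, 0.10, 0.80, 0.03],
    [0.60, 0.05, 0.20, 0.10, 0.05],
    [0.10, 0.05, 0.70, 0.10, 0.05],
])

# each row collapses to the index of its largest probability
predicted_classes = np.argmax(probs, axis=1)
print(predicted_classes)  # [3 0 2]
```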
print(classification_report(true_classes, predicted_classes))
              precision    recall  f1-score   support

           0       0.68      0.65      0.66       245
           1       0.46      0.27      0.34        22
           2       0.62      0.72      0.67       319
           3       0.86      0.84      0.85       458
           4       0.87      0.77      0.81       185

    accuracy                           0.75      1229
   macro avg       0.70      0.65      0.67      1229
weighted avg       0.76      0.75      0.75      1229

The table above shows that the classes with the least data (see the support column) have the weakest performance. Precision (the fraction of samples predicted as class x that actually belong to class x) and recall (the fraction of class-x samples that are correctly labeled as x) are both high for class 3, which also has the most samples. The F1-score is the harmonic mean of precision and recall, calculated as

$F_1 = 2 \cdot \frac{\text{precision} \ \times \ \text{recall}}{\text{precision} \ +\ \text{recall}}$
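Plugging the class-3 row of the report into this formula checks the arithmetic:

```python
# precision and recall for class 3, taken from the classification report
precision, recall = 0.86, 0.84

# harmonic mean of two numbers: 2ab / (a + b)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.85, matching the f1-score reported for class 3
```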

Part 3. Combining the keypoint detection and facial expression recognition models¶

df_predict = predict(X_test_ed)
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 30ms/step
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
df_predict.head()
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y emotion
0 66.610451 40.496799 36.849854 25.179127 59.685787 38.832348 72.507202 43.841587 42.234669 28.970413 ... 51.316212 49.191910 70.809021 23.674337 58.018814 36.823456 64.206169 32.734856 67.690872 3
1 64.653389 38.042404 29.781794 34.580994 57.754307 38.556492 71.875885 39.505543 36.547447 36.649067 ... 59.056892 58.322350 77.191147 28.911465 74.403419 43.987125 74.182167 43.264114 80.884209 0
2 60.960011 37.275005 33.889050 33.123466 55.079693 37.744541 66.276917 38.548401 38.508617 35.537212 ... 59.973900 51.810165 79.506187 30.799669 76.300369 40.469513 78.190208 40.058804 79.434502 2
3 56.973568 37.822292 25.538334 38.267265 50.197941 38.189533 64.570465 37.789585 32.019379 38.396927 ... 44.175758 59.712654 48.962036 30.902287 48.933670 44.473202 48.845013 45.226276 49.808178 3
4 62.833557 40.498875 31.723820 37.697025 56.710606 41.063759 69.142136 41.633850 37.679867 39.783356 ... 58.978386 57.483547 75.107430 30.617802 73.028336 44.611389 72.902473 44.198177 77.997261 0

5 rows × 31 columns
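Although the definition of the `predict` helper is not shown in this excerpt, combining the two heads amounts to running the keypoint regressor and the emotion classifier on the same batch of images and joining their outputs into one table: 30 keypoint coordinates plus one emotion label, giving the 31 columns seen above. A minimal sketch of that joining step, with hypothetical placeholder column names and random stand-in predictions:

```python
import numpy as np
import pandas as pd

def combine_predictions(keypoints, emotions, keypoint_columns):
    """Merge keypoint regressions of shape (n, 30) with per-image emotion
    labels of shape (n,) into a single (n, 31) DataFrame."""
    df = pd.DataFrame(keypoints, columns=keypoint_columns)
    df["emotion"] = emotions
    return df

# hypothetical outputs for two test images
cols = [f"kp_{i}" for i in range(30)]  # placeholder keypoint column names
kp = np.random.rand(2, 30) * 96        # coordinates in the 96x96 pixel space
em = np.array([3, 0])                  # argmax emotion labels
df_demo = combine_predictions(kp, em, cols)
print(df_demo.shape)  # (2, 31)
```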

Plotting test images annotated with the combined model's keypoint and emotion predictions.

No description has been provided for this image